Abstract. We present a method for hierarchical image segmentation
that defines a disaffinity graph on the image, over-segments it into watershed
basins, defines a new graph on the basins, and then merges basins
with a modified, size-dependent version of single linkage clustering. The
quasilinear runtime of the method makes it suitable for segmenting large
images. We illustrate the method on the challenging problem of segmenting
3D electron microscopic brain images.
1 Introduction

Light and electron microscopy can now produce terascale 3D images within 
hours [1,2]. For segmenting such large images, efficient algorithms are important. 
The watershed algorithm has linear runtime but tends to produce severe over-segmentation,
which is typically counteracted by pre- and/or post-processing.
Here we update this classic approach, providing a new algorithm for watershed 
on edge-weighted graphs, and a novel post-processing method based on single 
linkage clustering modified to use prior knowledge of segment sizes. 

The input is assumed to be a disaffinity graph, in which a small edge weight 
indicates that the image voxels connected by the edge are likely to belong to the 
same segment. Our watershed transform works by finding the basins of attraction 
of steepest descent dynamics, and has runtime that is linear in the number of 
disaffinity graph edges. It yields basins similar to those of watershed cuts [3, 4], 
except that plateaus are divided between basins consistently and in a more even 
way. Our post-processing starts by examining the new graph on the basins, in 
which the edge connecting two basins is assigned the same weight as the minimal 
edge connecting the basins in the original disaffinity graph. Then single linkage 
clustering yields a hierarchical segmentation in which the lowest level consists of 
the watershed basins. Each level of single linkage clustering is a flat segmentation 
in which some of the basins are merged. If we only expect to use levels above
some minimum value $T_{\min}$, then it turns out to be equivalent and more efficient
to preprocess the original disaffinity graph before watershed by setting all edge
weights below $T_{\min}$ to a common low value. In another pre-processing step we
remove the edges with disaffinity above an upper threshold $T_{\max}$ to allow for
unsegmented regions.

We also show how to modify single linkage clustering by making it depend 
not only on edge weights but also on cluster size. The modification is useful 
when there is prior knowledge about the size of true segments, and is shown to 
have an efficient implementation because size is a property that is guaranteed to 
increase with each agglomerative step. The runtime of single linkage clustering 
is quasilinear in the number of edges in the watershed basin graph. 

Felzenszwalb et al. [5] and Guimaraes et al. [6] have proposed efficient image 
segmentation methods that are quasilinear in the number of edges in the disaffinity
graph. We show that our method produces superior results to that of [5]
for the segmentation of neural images from serial electron microscopy. 

2 Watershed Transform 

Inspired by the drop of water principle [3] we define a steepest descent discrete 
dynamics on a connected edge-weighted graph G = (V,E) with non-negative 
weights. A water drop travels from a vertex to another vertex using only locally 
minimal edges. An edge {u,v} is locally minimal with respect to u if there is no 
edge in E incident to u with lower weight. Starting from a vertex $v_0$ the evolution
of the system can be represented as a steepest descent walk $(v_0, e_0, v_1, e_1, v_2, \ldots)$
where every edge $e_i$ is locally minimal with respect to $v_i$. A regional minimum M
is a connected subgraph of G such that there is a steepest descent walk between 
any pair of vertices in M, and every steepest descent walk in G starting from a 
vertex in M will stay within M. A vertex v belongs to the basin of attraction 
of a regional minimum M if there exists a steepest descent walk from v to any 
vertex in M. Note that v can belong to basins of attraction of multiple regional
minima. In our watershed transform we partition V into basins of attraction of 
the regional minima. Vertices belonging to more than one basin of attraction 
will be referred to as border vertices and will be assigned to one of the basins as 
described below. 

Steepest descent graph. The central quantity in the watershed algorithm 
is the steepest descent graph, defined as follows. Consider an undirected weighted 
graph G (Fig. 1(a)). Define the directed graph G' in which each undirected 
edge of G is replaced by both directed edges between the same vertices. The 
steepest descent graph D (Fig. 1(b)) is a subgraph of G' with the property that 
D includes every edge of G' with minimal weight of all edges outgoing from 
the same vertex. A directed path in D is a path of steepest descent in G. The 
steepest ascent graph can be defined analogously using edges of maximal weight. 
Either steepest ascent or descent can be used without loss of generality. For 
simplicity, for a given vertex v we will refer to its edges in D as incoming, 
outgoing, and bidirectional. A plateau is a connected component of the subgraph 
of D containing only bidirectional edges. A plateau corner is a vertex of a plateau 
that has at least one outgoing edge. Locally minimal plateaus contain no plateau
corners; they are equivalent to the regional minima of the original graph. Non-minimal
plateaus contain one or more plateau corners. A saddle vertex has more
than one outgoing edge. In Fig. 1(c) we show locally minimal plateaus (black),
a non-minimal plateau (dark gray), plateau corners (C), and a saddle vertex (S).

Fig. 1. (a) A disaffinity graph; (b) derived steepest descent graph; (c) locally
minimal plateaus (black), non-minimal plateau (dark gray), saddle vertex (S),
plateau corners (C); (d) the two basins of attraction and border vertices (dark
gray)
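
The definitions of the steepest descent graph, plateau corners, and saddle vertices can be illustrated with a minimal Python sketch. The function names and the `(u, v, weight)` edge-list format are assumptions made for illustration, not part of the paper. The sketch also assumes that "outgoing" means strictly outgoing, that is, an edge of D that is not bidirectional.

```python
from collections import defaultdict

def steepest_descent_graph(edges):
    """Return {u: set of v} holding the directed edges u -> v kept in D.

    `edges` is a list of (u, v, weight) tuples of an undirected graph
    (hypothetical input format chosen for this sketch).
    """
    incident = defaultdict(list)
    for u, v, w in edges:
        incident[u].append((v, w))
        incident[v].append((u, w))
    out = {}
    for u, nbrs in incident.items():
        lowest = min(w for _, w in nbrs)
        # Keep every edge leaving u whose weight is minimal at u.
        out[u] = {v for v, w in nbrs if w == lowest}
    return out

def classify(out):
    """Return bidirectional edges, plateau corners, and saddle vertices of D."""
    bidir = {u: {v for v in vs if u in out[v]} for u, vs in out.items()}
    # Assumption: "outgoing" excludes bidirectional edges.
    outgoing = {u: out[u] - bidir[u] for u in out}
    corners = {u for u in out if bidir[u] and outgoing[u]}
    saddles = {u for u in out if len(outgoing[u]) > 1}
    return bidir, corners, saddles

# A path 0-1-2-3-4: {0, 1} is a regional minimum, {2, 3, 4} a non-minimal plateau.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 2.0), (3, 4, 2.0)]
D = steepest_descent_graph(edges)
bidir, corners, saddles = classify(D)
```

In this toy graph vertex 2 is the only plateau corner, because it lies on the plateau {2, 3, 4} and also has a strictly outgoing edge towards vertex 1.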

Assigning border vertices. In Fig. 1(d) we show the basins of attraction of 
the two regional minima. The border vertices are shown in dark gray and belong 
to both basins of attraction. Watershed cuts [3, 4] assign border vertices with a 
single constraint that all the basins of attraction have to be connected. We introduce
additional constraints. The watershed transform has to be uniquely defined
and the non-minimal plateaus should be divided evenly. More specifically, we 
want our dynamics to be uniquely defined at saddle vertices, and the vertices of 
the non-minimal plateaus to be assigned to the same basin of attraction as the 
nearest plateau corner - a plateau corner reachable in fewest steps following the 
rules of our dynamics. 



Fig. 2. (a) Vertex indices; (b) distances to the nearest plateau corner; (c) modifications
to the steepest descent graph; (d) final watershed partition of the graph


Watershed transform algorithm. We introduce an ordering function
$a : V \to \{1, 2, \ldots, |V|\}$ such that $a(u) \neq a(v)$ if and only if $u \neq v$. We will refer to
$a(u)$ as the index of u (Fig. 2(a)). In the first part of the algorithm we modify D
by removing edges. For all saddle vertices we keep only one outgoing edge - the
one pointing to a vertex with the lowest index. In the next step we divide the
non-minimal plateaus. We initialize a global FIFO queue Q, mark all the plateau 
corner vertices as visited and insert them into Q in increasing order of their 
index. While Q is not empty we remove the vertex v from the front of the queue
and explore all the bidirectional edges {v,u}. If u is not visited, we mark it
as visited, insert it at the back of the queue and change the edge to be incoming
(v ← u). Otherwise, if u was already visited, we just remove the edge.
The resulting steepest descent graph is shown in Fig. 2(c) - the dotted edges
are removed. Considering all the remaining edges as bidirectional, the connected
components of the modified descent graph D will be the watershed basins of
attraction.
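
The steps above can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' listing (which is deferred to the supplementary material). It assumes integer vertex labels that double as their own index, a `(u, v, weight)` edge list, and that "outgoing" means strictly outgoing. The calls to `sorted` are for brevity, so the sketch is not tuned for the linear runtime of the real algorithm.

```python
from collections import defaultdict, deque

def watershed(edges):
    """Map each vertex to a basin label (a representative vertex)."""
    incident = defaultdict(list)
    for u, v, w in edges:
        incident[u].append((v, w))
        incident[v].append((u, w))

    # Steepest descent graph D: minimal-weight edges leaving each vertex.
    out = {}
    for u, nbrs in incident.items():
        lowest = min(w for _, w in nbrs)
        out[u] = {v for v, w in nbrs if w == lowest}
    bidir = {u: {v for v in vs if u in out[v]} for u, vs in out.items()}
    pure = {u: out[u] - bidir[u] for u in out}  # strictly outgoing edges

    # Saddle vertices keep only the edge to the lowest-index vertex.
    for u in pure:
        if len(pure[u]) > 1:
            pure[u] = {min(pure[u])}

    # Divide non-minimal plateaus by breadth-first search from the corners.
    corners = sorted(u for u in out if bidir[u] and pure[u])
    visited = set(corners)
    queue = deque(corners)
    kept = {u: set(vs) for u, vs in pure.items()}  # edges of the modified D
    while queue:
        v = queue.popleft()
        for u in sorted(bidir[v]):
            if u not in visited:
                visited.add(u)
                queue.append(u)
                kept[u].add(v)  # the edge becomes incoming for v (u -> v)
            # otherwise the edge {v, u} is simply dropped

    # Locally minimal plateaus have no corners, so their edges are untouched.
    for u in out:
        if u not in visited:
            kept[u] |= bidir[u]

    # Connected components of the modified D, edges taken as undirected.
    parent = {u: u for u in out}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, vs in kept.items():
        for v in vs:
            parent[find(u)] = find(v)
    return {u: find(u) for u in out}

# Path 0..6 with minima {0, 1} and {5, 6} and a plateau {2, 3, 4} between them.
edges = [(0, 1, 1), (1, 2, 3), (2, 3, 3), (3, 4, 3), (4, 5, 3), (5, 6, 1)]
labels = watershed(edges)
```

In this example vertices 2 and 4 are both plateau corners and vertex 3 is equally close to each; it joins the basin of vertex 2 because corners enter the queue in increasing index order.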

The algorithm runs in linear time with respect to the number of edges in 
G and produces an optimal partitioning as defined in [3]. The total number of 
segments in the partitioning will equal the total number of regional minima.
We defer the detailed algorithm listing, the proof of correctness and running 
time analysis to the supplementary material. 

Reducing over-segmentation. Noisy values of disaffinities can produce 
severe over-segmentation (Fig. 3(c)). In order to reduce the over-segmentation 
we often merge adjacent segments with the saliency below some given threshold
$T_{\min}$ [7]. The saliency of two adjacent segments is defined as the value of the
minimal disaffinity between the vertices of the two segments. That means that we
are confident that disaffinities below $T_{\min}$ connect vertices of the same segment.
An equivalent segmentation can be obtained by setting the weights of all edges
in G with weight smaller than $T_{\min}$ to a common low value (e.g. 0) before
applying the watershed transform. We prove this claim in the supplementary
material. To express confidence about high values of disaffinities, and in order to
prevent undesired mergers, we introduce a threshold $T_{\max}$ by erasing all the edges
from G with weight higher than $T_{\max}$, essentially setting them to $\infty$.
The $T_{\max}$ threshold can produce singleton vertices in G. The singleton vertices
are not assigned to any basin of attraction and are considered background, which 
is often a desired result. 
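
The two thresholds amount to a single pass over the edge list. The following is a minimal sketch; the function name `preprocess`, the `(u, v, weight)` edge format, and the default low value of 0 are assumptions chosen for illustration.

```python
def preprocess(edges, t_min, t_max, low=0.0):
    """Apply the lower and upper thresholds to (u, v, weight) edges."""
    result = []
    for u, v, w in edges:
        if w > t_max:
            continue  # erase the edge, as if its weight were infinite
        # Weights below t_min collapse to one common low value.
        result.append((u, v, low if w < t_min else w))
    return result

# Thresholds taken from the caption of Fig. 3(d).
thresholded = preprocess([(0, 1, 0.005), (1, 2, 0.5), (2, 3, 0.95)], 0.01, 0.9)
```

Vertex 3 loses its only edge here, so it becomes a singleton and would be treated as background.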

3 Hierarchical Clustering of the Watershed Basin Graph 

A hierarchical clustering of an undirected weighted graph treats each vertex as 
a singleton cluster and successively merges clusters connected by an edge in 
the graph. A cluster is always a connected subset of the graph’s vertices. Each 
merge operation creates a new level of the hierarchy - a flat segmentation where
each cluster represents a segment. In single linkage clustering, each step merges 
two clusters connected by an edge with the lowest weight in the original graph. 
Single linkage clustering is equivalent to finding the minimum spanning tree of 
the graph [8]. 
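
The link between single linkage clustering and the minimum spanning tree can be seen in a short sketch that is essentially Kruskal's algorithm: edges are visited by increasing weight, and every edge that joins two different clusters is both a merge step and a tree edge. The function name and input format below are assumptions for illustration.

```python
def single_linkage(n, edges):
    """Return the merging edges, in order, for vertices 0..n-1.

    `edges` is a list of (u, v, weight). The returned edges form a
    minimum spanning forest of the graph.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge connects two distinct clusters
            parent[ru] = rv
            merges.append((u, v, w))
    return merges

tree = single_linkage(4, [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4)])
```

Each entry of `tree` corresponds to one new level of the hierarchy.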

In this section we propose a size-dependent single linkage clustering. The 
method can be applied to any edge-weighted graph; however, we find it superior
when used on the watershed basin graph defined as follows. Let
$V_W = \{B_1, B_2, \ldots\}$ be the set of watershed basins obtained by the watershed
transform of a graph G = (V,E). We define the watershed basin graph of G as
$G_W = (V_W, E_W)$ where an edge $\{B_i, B_j\}$ exists in $E_W$ for all neighboring basins
$B_i$ and $B_j$, and has the weight $w(\{B_i, B_j\})$ equal to the saliency of the two
basins. We will refer to the vertices of the watershed basin graph as basins and 
to the edge weights as saliencies. 
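
Building the watershed basin graph is one pass over the edges of G. The sketch below assumes a hypothetical `labels` dictionary mapping each vertex to a comparable basin identifier, such as the output of a watershed routine, and a `(u, v, weight)` edge list.

```python
def basin_graph(edges, labels):
    """Return {(a, b): saliency} for neighboring basins with a < b."""
    saliency = {}
    for u, v, w in edges:
        a, b = labels[u], labels[v]
        if a == b:
            continue  # edge inside one basin
        key = (a, b) if a < b else (b, a)
        # Saliency is the minimal disaffinity between the two basins.
        if key not in saliency or w < saliency[key]:
            saliency[key] = w
    return saliency

edges = [(0, 1, 1), (1, 2, 3), (2, 3, 1), (1, 3, 2)]
labels = {0: 0, 1: 0, 2: 1, 3: 1}
saliencies = basin_graph(edges, labels)
```

Two edges cross between the basins in this example, with weights 3 and 2, so the saliency is 2.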

In our size-dependent single linkage clustering method, in each step we merge 
clusters with the lowest saliency that don’t satisfy a given predicate. Saliency 
of two clusters is defined as the minimal saliency of any two members: 


$$ d_{C_1, C_2} = \min_{B_i \in C_1,\; B_j \in C_2,\; \{B_i, B_j\} \in E_W} w(\{B_i, B_j\}) \tag{1} $$


At the last level of the hierarchy all pairs of clusters will satisfy the predicate. 

Size-dependent comparison predicate. We define a predicate A for evaluating
whether two clusters should be merged. The predicate is based on the
sizes of the two clusters. Let S(C) represent the size of C (e.g. number of basins
in the cluster or the sum of the basin sizes). We first define a non-increasing
threshold function of a cluster size r(s). The value of r(s) represents the maximal
saliency allowed between a cluster of size s and any adjacent cluster. Our
predicate is then defined as: 


$$ A(C_1, C_2) = \begin{cases} \text{true} & \text{if } d_{C_1, C_2} > r(\min\{S(C_1), S(C_2)\}) \\ \text{false} & \text{otherwise} \end{cases} \tag{2} $$


The intuition behind the predicate is to apply prior knowledge about the 
sizes of the true segments. With the threshold function we control the confidence 
required to grow a cluster of a certain size. 

With a slight modification of the predicate we could allow for an arbitrary
threshold function (changing the condition to $d_{C_1, C_2} > \min\{r(S(C_1)), r(S(C_2))\}$).
However, restricting the function to be non-increasing allows us to design a more
efficient algorithm. It is also more intuitive to allow higher saliency for merging
small clusters and require lower saliency as the sizes of the clusters grow. As r is
required to be non-increasing, we can find a non-increasing function $\omega$ such that
whenever (2) is satisfied, $\omega(d_{C_1, C_2}) < \min\{S(C_1), S(C_2)\}$ is satisfied. This allows us
to specify either r or $\omega$ for the predicate. For example, when $\omega$ is
constant the algorithm will tend to aggressively merge segments smaller than
the given constant.
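
As a worked instance, take the function $\omega(w) = 3000(1 - w)$ reported in the caption of Fig. 3(e), and assume (this is not stated in the text) that saliencies lie in [0, 1]. One threshold function consistent with it would be $r(s) = 1 - s/3000$, which decreases with s, since:

```latex
\begin{align*}
d_{C_1,C_2} > r(s) = 1 - \frac{s}{3000}
\;\Longleftrightarrow\;
s > 3000\,(1 - d_{C_1,C_2}) = \omega(d_{C_1,C_2}),
\qquad s = \min\{S(C_1), S(C_2)\}.
\end{align*}
```

Under this assumption, two clusters with saliency 0.5 would be merged only if the smaller of them has size at most 1500.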

Algorithm 1 In our clustering algorithm we visit all the edges of the watershed
basin graph in non-decreasing order and merge the corresponding clusters
based on the introduced predicate.


1. Order $E_W$ into $\pi = (e_1, \ldots, e_n)$, by non-decreasing edge weight.

2. Start with basins as singleton clusters $S^0 = \{C_1 = \{B_1\}, C_2 = \{B_2\}, \ldots\}$

3. Repeat step 4 for $k = 1, \ldots, n$

4. Construct $S^k$ from $S^{k-1}$. Let $e_k = \{B_i, B_j\}$ be the k-th edge in the ordering.
Let $C_i^{k-1}$ and $C_j^{k-1}$ be components of $S^{k-1}$ containing $B_i$ and $B_j$. If
$C_i^{k-1} \neq C_j^{k-1}$ and $A(C_i^{k-1}, C_j^{k-1})$ is not satisfied then $S^k$ is created from $S^{k-1}$ by
merging $C_i^{k-1}$ and $C_j^{k-1}$, otherwise $S^k = S^{k-1}$.

5. Return the hierarchical segmentation $(S^0, \ldots, S^n)$
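
Algorithm 1 can be sketched with a union-find structure. This is an illustrative reading rather than the authors' implementation. The function name, the dictionary inputs, and the use of the $\omega$ form of the predicate are assumptions. The sketch also uses the weight of the visited edge as the cluster saliency, which is the minimal saliency whenever no lighter edge between the same two clusters was rejected earlier.

```python
def size_dependent_single_linkage(sizes, saliencies, omega):
    """Merge basins by non-decreasing saliency, subject to the size predicate.

    sizes:      {basin: size}
    saliencies: {(basin_a, basin_b): weight}
    omega:      function of a saliency; a merge is blocked when
                omega(weight) < min(size of the two clusters)
    Returns (merges, labels): accepted edges in order, and final cluster labels.
    """
    parent = {b: b for b in sizes}
    size = dict(sizes)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for (a, b), w in sorted(saliencies.items(), key=lambda kv: kv[1]):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in the same cluster
        if omega(w) < min(size[ra], size[rb]):
            continue  # predicate satisfied: do not merge
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # union by size
        size[ra] += size[rb]
        merges.append((w, a, b))
    return merges, {b: find(b) for b in sizes}

sizes = {0: 10, 1: 10, 2: 5000, 3: 5000}
saliencies = {(0, 1): 0.1, (1, 2): 0.5, (2, 3): 0.6}
merges, labels = size_dependent_single_linkage(
    sizes, saliencies, lambda w: 3000 * (1 - w))
```

In this toy input the two small basins are absorbed into basin 2, while the two large basins stay separate because both exceed the size that $\omega$ allows at saliency 0.6. Since the sorted edge list does not depend on $\omega$, it could be cached and reused across threshold functions.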


Theorem 1 The highest level of the hierarchical segmentation produced by
Algorithm 1 will have the predicate A satisfied for all pairs of the clusters. The
complexity of the algorithm is $O(|E_W| \log |E_W|)$. The algorithm can be modified to
consider only the edges of the minimum cost spanning tree of $G_W$.

We defer the proof to the supplementary material.

Steps 2-5 of the algorithm have near-linear complexity. Once we have
a sorted list of the edges we can re-run the algorithm for different threshold 
functions more efficiently. 



Fig. 3. Segmentation of a $256^3$ EM image by our method and that of [5]: (a)
slice of the raw image; (b) slice of nearest neighbor disaffinity graph, with xyz
disaffinities represented with RGB; (c) watershed transform of raw image; (d)
watershed transform after preprocessing with $T_{\min} = 0.01$ and $T_{\max} = 0.9$;
(e) post-processing with size-dependent single linkage clustering using
$\omega(w) = 3000(1 - w)$; (f, g) [5] with k = 0.5 yields severe oversegmentation while k = 10
merges neurons; (h) ground truth segmentation from human expert





4 Results 


We applied our method to 3D electron microscopic brain images [9] (Fig. 3(a)). 
Disaffinity graphs were computed using convolutional networks [10] (Fig. 3(b)). 
The watershed transform produced severe oversegmentation (Fig. 3(c)), which 
was reduced by pre-processing the disaffinity graph with upper and lower thresholds
(Fig. 3(d)). Size-dependent single linkage clustering further reduced oversegmentation
(Fig. 3(e)). The first function forced all the segments to be at
least some minimal size. The second and the third functions required the minimal
size of the segment to be proportional to the affinity (or the square of affinity).




Fig. 4. Scores of our method and that of [5] relative to the ground truth segmentation:
(a) our method with several threshold functions, versus that of [5] applied to the
disaffinity graph and to the watershed basin graph. Upper right is better, lower left
is worse; (b) segmentation obtained by our method with $\omega(w) = 3000(1 - w)$


Measuring the quality of the segmentations. We evaluated the segmentations
by comparing to the ground truth generated by a human expert. Split
and merge scores were computed by

$$ V_{\text{split}} = \frac{\sum_{ij} p_{ij}^2}{\sum_k t_k^2} \quad \text{and} \quad V_{\text{merge}} = \frac{\sum_{ij} p_{ij}^2}{\sum_k s_k^2} \tag{3} $$

where $p_{ij}$ is the probability that a randomly chosen voxel belongs to segment i
in the proposed segmentation and segment j in the ground truth, $s_i$ and $t_j$ are
probabilities of a randomly chosen voxel belonging to predicted segment i and
ground truth segment j respectively. The scores are similar to the Rand index, 
a well-known metric for clustering [11], except that they distinguish between 
split and merge errors. Higher scores mean fewer errors. Scoring was restricted 
to the foreground voxels in the ground truth. We tested our method with several 
threshold functions, and also applied the method of [5] to the disaffinity graph
and to the watershed basin graph. Our method achieved superior scores (Fig.
4(a)). The parameter k in both methods determines the trade-off between the
number of mergers and splits. When k in the method of [5] is optimized to have
approximately the same number of mergers as our method, a large number of splits
is introduced (Fig. 3(f)), and vice versa (Fig. 3(g)).
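
The split and merge scores of Eq. (3) can be computed from per-voxel label pairs. The sketch below is a minimal illustration; the function name and the flat-sequence input are assumptions, and the caller is expected to have already restricted both sequences to the foreground voxels of the ground truth.

```python
from collections import Counter

def split_merge_scores(predicted, truth):
    """Return (split score, merge score) for two equal-length label sequences."""
    n = len(truth)
    joint = Counter(zip(predicted, truth))  # counts behind p_ij
    pred = Counter(predicted)               # counts behind s_i
    true = Counter(truth)                   # counts behind t_j
    sum_p = sum((c / n) ** 2 for c in joint.values())
    v_split = sum_p / sum((c / n) ** 2 for c in true.values())
    v_merge = sum_p / sum((c / n) ** 2 for c in pred.values())
    return v_split, v_merge

# One predicted segment covering two true segments: a pure merge error.
merged = split_merge_scores([1, 1, 1, 1], [1, 1, 2, 2])
# Prediction identical to the ground truth.
perfect = split_merge_scores([1, 1, 2, 2], [1, 1, 2, 2])
```

A pure merge error leaves the split score at 1 and lowers only the merge score, which is the distinction the two scores are meant to capture.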

In conclusion, the runtime of our method makes it very suitable for segmenting
very large images. It greatly outperforms other methods similar in runtime
complexity. Our method can greatly reduce the oversegmentation while introducing
virtually no mergers.